
@Byron
Member

@Byron Byron commented Jun 22, 2025

Tasks

  • report
  • timesheets
  • release as discussion and on Reddit

For the discussion:

gitoxide

@Byron
Member Author

Byron commented Jun 22, 2025

I don't know why, @EliahKagan, but Windows seems to fail flakily much more often now. It's probably a timing change, with the runners getting slower or faster. I suspect that only because not much of the relevant code has changed in the past few months. This note is just to see if you have thoughts on this.

@EliahKagan
Member

EliahKagan commented Jun 22, 2025

There are two separate problems: a nondeterministic failure that affects a small number of tests and does not usually occur, and new deterministic failures that come in with Git 2.50.0.

The PRNG-related symlink probe failure

The failure in this PR is the old nondeterministic failure. Specifically, the test case that failed in test-fixtures-windows but that was not expected to fail was symlinks_to_directories_are_usable. The failure details can be seen here in the run before the recent force-push:

        FAIL [   0.017s] gix-worktree-state-tests::worktree state::checkout::symlinks_to_directories_are_usable
  stdout ───

    running 1 test
    test state::checkout::symlinks_to_directories_are_usable ... FAILED

    failures:

    failures:
        state::checkout::symlinks_to_directories_are_usable

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 0.00s
    
  stderr ───

    thread 'state::checkout::symlinks_to_directories_are_usable' panicked at gix-worktree-state\tests\state\checkout.rs:288:5:
    The probe must detect to be able to generate symlinks

That strongly appears to be due to #1816. That probe currently fails occasionally in all of the functions that perform it, for that reason, and I believe no other reasons for it to fail are known or believed to exist. Rerunning the tests will most likely make that go away. (We can fix issue #1816, as discussed there, and I plan to do so; rerunning tests that fail due to it is only something I recommend doing in the meantime, not forever.)

Examining the step that compares the list of tests currently expected to fail in test-fixtures-windows with the list of tests that did fail reveals that, in that run, symlinks_to_directories_are_usable really was the only failed test that was expected not to have failed. If not for that one test case unexpectedly failing, that CI job would've passed.
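A comparison like the one that step performs can be sketched in shell as follows. This is only an illustration, not the actual CI step: the function name and the one-test-name-per-line file format are assumptions.

```sh
# Hypothetical helper: print the tests that failed but were not expected to.
# $1: file listing tests expected to fail, one name per line (assumed format)
# $2: file listing tests that actually failed, one name per line
unexpected_failures() {
    # -F: fixed strings, -x: whole-line matches, -v: invert the match, so
    # only failures absent from the expected list are printed.
    # grep exits with 1 when it prints nothing; that is not an error here.
    grep -Fxv -f "$1" -- "$2" || [ "$?" -eq 1 ]
}
```

Under these assumptions, a run passes the check only when the function prints nothing.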

The new failures with Git 2.50.0

This information is incomplete, but hopefully it will be of some value and maybe save some time. Edit: This is more fully researched now, and I would consider it confirmed, but it could use more confirmation in the form of local test runs on a GNU/Linux system with Git 2.49.0 versus Git 2.50.0.

Git 2.50.0 was recently released. At the moment, when jobs are run on windows-latest, sometimes a runner image is used that has Git 2.49.0 installed, and sometimes a runner image is used that has Git 2.50.0 installed. I believe this is simply due to the updated image being rolled out gradually and/or due to A/B testing, or something else of the sort. It looks like this upstream repository is currently usually (maybe always) using a runner image with Git 2.49.0, while my fork is usually (maybe always) using a runner image with Git 2.50.0.

To be clear, the ubuntu-latest and macos-latest images will eventually be updated to use Git 2.50.0 as well. I believe that, when the ubuntu-latest image is updated, the full test job--which tests with GIX_TEST_IGNORE_ARCHIVES=1--will likewise be affected, though I have not done any local testing to confirm this. In any case, for now, only the test-fixtures-windows job is affected.

What happens is that quite a lot of new failures occur, but only when fixture scripts are rerun. Furthermore, all the new failures are due to the inability to successfully run the fixture script gix/tests/fixtures/make_rev_spec_parse_repos.sh. The failures look like this (though various test cases fail this way, not just this one):

FAIL [   0.373s] gix::gix revision::spec::from_bytes::access_blob_through_tree
  stdout ───

    running 1 test
    test revision::spec::from_bytes::access_blob_through_tree ... FAILED

    failures:

    failures:
        revision::spec::from_bytes::access_blob_through_tree

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 324 filtered out; finished in 0.36s
    
  stderr ───
    failed to extract 'tests\fixtures\generated-archives\make_rev_spec_parse_repos.tar': Ignoring archive at 'tests\fixtures\generated-archives\make_rev_spec_parse_repos.tar' as GIX_TEST_IGNORE_ARCHIVES is set.
    stdout: Initialized empty Git repository in D:/a/gitoxide/gitoxide/gix/tests/fixtures/generated-do-not-edit/make_rev_spec_parse_repos/2818959361-windows/blob.prefix/
    dead7b21a85f6dc7a24cbc4bb04a008db70bc04a
    dead9d36640e108d9eb449ed5966fd0c6d4e6b7f
    beefc9be42a87abd92326257a995bf20a24c788f
    beef2b0b99a5a8a36d91ea9ecd766af6352eafd9
    Initialized empty Git repository in D:/a/gitoxide/gitoxide/gix/tests/fixtures/generated-do-not-edit/make_rev_spec_parse_repos/2818959361-windows/blob.bad/

    stderr: fatal: invalid object type "bad"


    thread 'revision::spec::from_bytes::access_blob_through_tree' panicked at gix\tests\gix\revision\spec\from_bytes\mod.rs:137:51:
    called `Result::unwrap()` on an `Err` value: "fixture script of \"C:/Program Files/Git/bin/bash.exe\" \"D:\\\\a\\\\gitoxide\\\\gitoxide\\\\gix\\\\tests\\\\fixtures\\\\make_rev_spec_parse_repos.sh\" failed"
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The command that ran successfully with Git 2.49.0 but errors out with Git 2.50.0 seems to be:

echo xyzfaowcoh | git hash-object -t bad -w --stdin --literally

I distinctly recall, from the Git mailing list, discussion of a change to make it an error to explicitly create Git objects of unrecognized type. My guess is that this has come in with Git 2.50.0 and that it is the cause of this problem.

However, I am not seeing this in the changelog, and I am not readily finding it. Normally I would keep looking, but there is also a thunderstorm going on right now, where I am, that is likely to cause me temporarily to lose electrical power, internet access, or both. Therefore, I am posting this now, in the hope it may be useful even though not fully researched.

Edit: Yes, this is git/git@65a6a79. The first stable version of Git that includes that change is 2.50.0, and the change intentionally drops support in git for the exact kind of git hash-object command (with an unrecognized string operand to -t) that we are using here.
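For a fixture script that needs to behave differently depending on the installed Git, a version check is one possible approach. The sketch below is an assumption about how that could look, not something the fixture currently does; `parse_git_version` and `version_at_least` are hypothetical helper names, and `sort -V` is an extension that not every `sort` provides.

```sh
# Hypothetical helper: read `git version` output on stdin and print only the
# leading dotted number, e.g. "git version 2.50.0.windows.1" -> "2.50.0".
parse_git_version() {
    sed -n 's/^git version \([0-9][0-9.]*[0-9]\).*/\1/p'
}

# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
    # With a version-aware sort, the smaller of the two versions comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A script could then branch on `version_at_least "$(git version | parse_git_version)" 2.50.0` before deciding how to create the object of type `bad`.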

@Byron Byron marked this pull request as ready for review June 22, 2025 15:57
@Byron
Member Author

Byron commented Jun 22, 2025

Thanks for the elaborate reply!

I see that the PRNG-related issue is 'the usual one', with more issues on the horizon due to the Git update. It's probably a welcome fix. Maybe a bad object can then be created using a shell script.

@Byron Byron enabled auto-merge June 22, 2025 16:02
@Byron Byron merged commit 391dd04 into main Jun 22, 2025
25 checks passed
@EliahKagan
Member

> Maybe a bad object can then be created using a shell script.

I think it could, maybe with a technique similar to what I used in the script shown in this comment. That comment can also be viewed publicly in #1915 (comment), where it is "Comment 23."

As noted in #2065 (comment), this problem has begun to affect CI here in this upstream repository; I predict that I will be able to fix the problem of the fixture script being incompatible with Git 2.50.0, but I don't know when I will be able to get to it; and there are some more errors we can expect to start happening on CI.

@Byron Byron deleted the report branch June 27, 2025 02:47
@Byron
Member Author

Byron commented Jun 27, 2025

> I think it could, maybe with a technique similar to what I used in the script shown in this comment. That comment can also be viewed publicly in #1915 (comment), where it is "Comment 23."

And since the public comment is very long (probably the longest comment I have ever seen on GitHub, maybe anywhere), here is the script in question, which creates loose objects using Python:

#!/usr/bin/env bash
set -euo pipefail

object_data_file="$1"
tree_entry_name="$2"

# Get the OID and ensure its bucket exists.
object_hash="$(sha1sum -- "$object_data_file" | awk '{ print $1 }')"
bucket=".git/objects/${object_hash:0:2}"
mkdir -p -- "$bucket"

# Create the loose object.
<"$object_data_file" python3 -c '
import sys
import zlib
sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))
' >"$bucket/${object_hash:2}"

# Create a tree that has the object as an entry.
tree_hash="$({
    printf '100644 %s\0' "$tree_entry_name"
    printf '%s' "$object_hash" | xxd -r -p
} | git hash-object -t tree -w --stdin --literally)"

# Create a commit with that tree, and set the current branch to it.
commit_hash="$(git commit-tree -m 'Initial commit' "$tree_hash")"
branch="$(git symbolic-ref --short HEAD)"
git branch -f -- "$branch" "$commit_hash"

# Show what we have. (Can run `git fsck` afterwards to show the corruption.)
set -x
git log
git ls-tree HEAD

The object-creation part can easily be adapted.
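As a rough sketch of such an adaptation, the object-creation step could be generalized so that it prepends the `<type> <size>\0` header itself and thereby writes an object of any type string, including `bad`, without `git hash-object`. The helper names below are hypothetical, and relying on `sha1sum` and `python3` being available to the fixture is an assumption carried over from the script above.

```sh
# Hypothetical helper: emit the raw loose-object bytes for type $1 and data
# file $2, i.e. the header "<type> <size>\0" followed by the data.
# The function body is a subshell so its variables stay local.
raw_object() (
    size=$(( $(wc -c <"$2") ))
    printf '%s %s\0' "$1" "$size"
    cat -- "$2"
)

# Hypothetical helper: print the SHA-1 object ID for type $1 and data file $2.
loose_object_id() {
    raw_object "$1" "$2" | sha1sum | awk '{ print $1 }'
}

# Hypothetical helper: write the zlib-compressed loose object under
# .git/objects in the current directory and print its object ID.
write_loose_object() (
    oid=$(loose_object_id "$1" "$2")
    bucket=".git/objects/$(printf '%s' "$oid" | cut -c 1-2)"
    mkdir -p -- "$bucket"
    raw_object "$1" "$2" | python3 -c '
import sys
import zlib
sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))
' >"$bucket/$(printf '%s' "$oid" | cut -c 3-)"
    printf '%s\n' "$oid"
)
```

For example, after `echo xyzfaowcoh >data`, running `write_loose_object bad data` inside a repository would be meant to stand in for the `git hash-object -t bad -w --stdin --literally` command that Git 2.50.0 rejects. The data is read from a regular file rather than from stdin because it is read more than once.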

In any case, thanks for your help with that, whenever you get to it. It's much appreciated 🙏.


2 participants